Audio-visual speech asynchrony modeling in a talking head

نویسندگان

  • Alexey Karpov
  • Liliya Tsirulnik
  • Zdenek Krnoul
  • Andrey Ronzhin
  • Boris Lobanov
  • Milos Zelezný
چکیده

An audio-visual speech synthesis system with modeling of asynchrony between auditory and visual speech modalities is proposed in the paper. Corpus-based study of real recordings gave us the required data for understanding the problem of modalities asynchrony that is partially caused by the coarticulation phenomena. A set of context-dependent timing rules and recommendations was elaborated in order to make a synchronization of auditory and visual speech cues of the animated talking head similar to a natural humanlike way. The cognitive evaluation of the model-based talking head for Russian with implementation of the original asynchrony model has shown high intelligibility and naturalness of audio-visual synthesized speech.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Asynchrony modeling for audio-visual speech recognition

We investigate the use of multi-stream HMMs in the automatic recognition of audio-visual speech. Multi-stream HMMs allow the modeling of asynchrony between the audio and visual state sequences at a variety of levels (phone, syllable, word, etc.) and are equivalent to product, or composite, HMMs. In this paper, we consider such models synchronized at the phone boundary level, allowing various de...

متن کامل

Audio-visual anticipatory coarticulation modeling by human and machine

The phenomenon of anticipatory coarticulation provides a basis for the observed asynchrony between the acoustic and visual onsets of phones in certain linguistic contexts. This type of asynchrony is typically not explicitly modeled in audio-visual speech models. In this work, we study within-word audiovisual asynchrony using manual labels of words in which theory suggests that audio-visual asyn...

متن کامل

Viseme-dependent weight optimization for CHMM-based audio-visual speech recognition

The aim of the present study is to investigate some key challenges of the audio-visual speech recognition technology, such as asynchrony modeling of multimodal speech, estimation of auditory and visual speech significance, as well as stream weight optimization. Our research shows that the use of viseme-dependent significance weights improves the performance of state asynchronous CHMM-based spee...

متن کامل

The Development of a Brazilian Talking Head

This paper describes partial results of a research, in progress at the School of Electrical and Computer Engineering of the State University of Campinas, aimed at developing a realistic three-dimensional Brazilian Talking Head. Through an extensive analysis of a video-audio linguistic corpus, a set of 29 phonetic context-dependent visemes (22 consonantal plus 7 vocalic visemes), that accommodat...

متن کامل

Recent Advances in the Automatic Recognition of Audio-Visual Speech

Visual speech information from the speaker’s mouth region has been successfully shown to improve noise robustness of automatic speech recognizers, thus promising to extend their usability in the human computer interface. In this paper, we review the main components of audio-visual automatic speech recognition and present novel contributions in two main areas: First, the visual front end design,...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009